Character-Level Interaction in Multimodal Computer-Assisted Transcription of Text Images
Abstract
To date, automatic handwritten text recognition systems are far from perfect, and heavy human intervention is often required to check and correct their results. As an alternative, previous works have presented an interactive framework that integrates human knowledge into the transcription process. In this work, multimodal interaction at the character level is studied. Until now, multimodal interaction had only been studied at the whole-word level. However, character-level pen-stroke interactions may lead to more ergonomic and user-friendly interfaces. Empirical tests show that this approach can save significant amounts of user effort with respect to both fully manual transcription and non-interactive post-editing correction.
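The character-level interaction loop summarized above can be sketched as a small simulation. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes that the recognizer's re-decoding under a user-validated prefix can be approximated by picking the best-scoring entry of a fixed N-best list (`nbest`) that starts with that prefix, and that every pen-stroke correction is recognized correctly as the intended character (in the real multimodal setting the stroke itself must be decoded and may be misrecognized). The names `constrained_hypothesis`, `simulate_char_interaction` and `edit_distance` are illustrative. User effort is counted as one action per corrected character, and compared with the character edit distance a post-editor would need and with typing the line manually.

```python
from typing import List, Tuple

# Hypothetical N-best list: (candidate transcription, score), higher is better.
NBest = List[Tuple[str, float]]


def constrained_hypothesis(nbest: NBest, prefix: str) -> str:
    """Best-scoring candidate that extends the user-validated prefix.

    Assumed stand-in for prefix-constrained decoding; if no candidate is
    compatible, the system can only return the prefix itself.
    """
    compatible = [(hyp, score) for hyp, score in nbest if hyp.startswith(prefix)]
    if not compatible:
        return prefix
    return max(compatible, key=lambda item: item[1])[0]


def simulate_char_interaction(nbest: NBest, reference: str) -> int:
    """Count the character-level corrections a simulated user needs.

    At each step the user validates the longest correct prefix and enters
    the first wrong character (assumed to be recognized without error);
    the system then proposes a new suffix consistent with that prefix.
    """
    prefix = ""
    corrections = 0
    hyp = constrained_hypothesis(nbest, prefix)
    while hyp != reference:
        # Length of the longest common prefix of hypothesis and reference.
        k = 0
        while k < min(len(hyp), len(reference)) and hyp[k] == reference[k]:
            k += 1
        if k == len(reference):
            # Hypothesis contains the full reference but is too long:
            # one action to accept the validated prefix as final.
            corrections += 1
            break
        # User validates reference[:k] and enters the correct character reference[k].
        prefix = reference[: k + 1]
        corrections += 1
        hyp = constrained_hypothesis(nbest, prefix)
    return corrections


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, used here as post-editing effort."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    nbest = [("tha cot", 0.5), ("the cat", 0.3)]
    reference = "the cat"
    print("interactive corrections:", simulate_char_interaction(nbest, reference))
    print("post-editing edits:", edit_distance(nbest[0][0], reference))
    print("manual keystrokes:", len(reference))
```

With `nbest = [("tha cot", 0.5), ("the cat", 0.3)]` and the reference `"the cat"`, a single correction (entering "e" after the validated "th") suffices, because the newly proposed suffix also fixes the later error; post-editing the first hypothesis takes two edits, and typing the line manually takes seven characters.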
Similar resources
A Web-Based Demo to Interactive Multimodal Transcription of Historic Text Images
Paleography experts spend many hours transcribing historic documents, and state-of-the-art handwritten text recognition systems are not suitable for performing this task automatically. In this paper, we present modifications to a previously developed interactive framework for the transcription of handwritten text. This system, rather than full automation, aimed at assisting the user with the rec...
Challenges in Transcribing Multimodal Data: a Case Study
Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS, etc.), has become normalized practice in persona...
Order embeddings and character-level convolutions for multimodal alignment
With the novel and fast advances in the area of deep neural networks, several challenging image-based tasks have recently been approached by researchers in pattern recognition and computer vision. In this paper, we address one of these tasks, which is to match image content with natural language descriptions, sometimes referred to as multimodal content retrieval. Such a task is particularly challe...
Preprocessing and Feature Extraction Techniques for Multimodal Interactive Transcription of Text Images
To date, automatic handwriting recognition systems are far from being perfect and heavy human intervention is often required to check and correct the results of such systems. This “post-editing” process is both inefficient and uncomfortable to the user. An example is the transcription of historic documents: State-of-the-art handwritten text recognition technology is not suitable to perform this...
A Multimodal Data Mining Framework for Revealing Common Sources of Spam Images
This paper proposes a multimodal framework that clusters spam images so that ones from the same spam source/cluster are grouped together. By identifying the common sources of spam images, we can provide evidence in tracking spam gangs. For this purpose, text recognition and visual feature extraction are performed. Subsequently, a two-level clustering method is applied where images with visually...